Search results for: expert behavior
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Instructional Material > Course Syllabus & Notes (0.86)
- Research Report > New Finding (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- North America > United States (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Chain of Thought Imitation with Procedure Cloning
Imitation learning aims to extract high-performance policies from logged demonstrations of expert behavior. It is common to frame imitation learning as a supervised learning problem in which one fits a function approximator to the input-output mapping exhibited by the logged demonstrations (input observations to output actions). While framing imitation learning as a supervised input-output learning problem allows for applicability in a wide variety of settings, it is an overly simplistic view in situations where the expert demonstrations provide much richer insight into expert behavior. For example, in applications such as path navigation, robot manipulation, and strategy games, expert demonstrations are acquired via planning, search, or some other multi-step algorithm, revealing not just the output action to be imitated but also the procedure for determining this action. While these intermediate computations may use tools not available to the agent during inference (e.g., environment simulators), they are nevertheless informative as a way to explain an expert's mapping from states to actions. To leverage this procedure information without relying on the privileged tools the expert may have used to perform the procedure, we propose procedure cloning, which applies supervised sequence prediction to imitate the complete series of expert computations. This way, procedure cloning learns not only what to do (i.e., the output action) but how and why to do it (i.e., the procedure). Through empirical analysis on navigation, simulated robotic manipulation, and game-playing environments, we show that imitating the intermediate computations of an expert's behavior enables procedure cloning to learn policies that generalize significantly better to unseen environment configurations, including configurations for which running the expert's procedure directly is infeasible.
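To make the idea concrete, here is a minimal sketch of procedure cloning as autoregressive sequence prediction, assuming expert procedures are serialized as discrete token sequences ending in the action token. The `ProcedureCloner` name, network sizes, and synthetic training data are illustrative placeholders, not details from the paper.

```python
# Minimal procedure-cloning sketch (assumptions: observations are feature vectors,
# expert "procedures" are serialized as token sequences whose last token is the action).
import torch
import torch.nn as nn

class ProcedureCloner(nn.Module):
    def __init__(self, obs_dim, vocab_size, hidden=128):
        super().__init__()
        self.obs_enc = nn.Linear(obs_dim, hidden)      # condition on the observation
        self.embed = nn.Embedding(vocab_size, hidden)  # procedure/action tokens
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, obs, tokens):
        # Teacher forcing: predict token t+1 from tokens <= t, conditioned on obs.
        h0 = torch.tanh(self.obs_enc(obs)).unsqueeze(0)  # (1, B, H) initial state
        out, _ = self.gru(self.embed(tokens), h0)
        return self.head(out)                            # (B, T, vocab) logits

# Training step: maximize likelihood of the full expert computation, not just the action.
model = ProcedureCloner(obs_dim=16, vocab_size=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs = torch.randn(8, 16)                   # synthetic stand-in observations
seq = torch.randint(0, 32, (8, 10))        # [proc_tok_1 ... proc_tok_L, action]
logits = model(obs, seq[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 32), seq[:, 1:].reshape(-1))
opt.zero_grad(); loss.backward(); opt.step()
```

At inference time one would decode the procedure tokens step by step and take the final token as the action, so the learned chain of computations stands in for the expert's privileged planner.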
- Asia > Middle East > Jordan (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Imitation Learning in the Deep Learning Era: A Novel Taxonomy and Recent Advances
Chrysomallis, Iason, Chalkiadakis, Georgios
Imitation learning (IL) enables agents to acquire skills by observing and replicating the behavior of one or multiple experts. In recent years, advances in deep learning have significantly expanded the capabilities and scalability of imitation learning across a range of domains, where expert data can range from full state-action trajectories to partial observations or unlabeled sequences. Alongside this growth, new methodologies have emerged to address longstanding challenges such as generalization, covariate shift, and demonstration quality. In this survey, we review the latest advances in imitation learning research, highlighting recent trends, methodological innovations, and practical applications. We propose a novel taxonomy, distinct from existing categorizations, that better reflects the current state and trends of IL research. Throughout the survey, we critically examine the strengths, limitations, and evaluation practices of representative works, and we outline key challenges and open directions for future research.
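As a point of reference for the supervised framing this survey builds on, a minimal behavioral-cloning baseline looks like the sketch below; the network sizes and synthetic data are assumptions for illustration only.

```python
# Minimal behavioral-cloning baseline: fit the expert's state->action mapping
# by plain supervised learning over logged demonstrations.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))  # obs -> action logits
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

expert_obs = torch.randn(256, 8)          # logged expert observations (synthetic here)
expert_act = torch.randint(0, 4, (256,))  # logged expert actions (discrete)

for _ in range(100):
    loss = nn.functional.cross_entropy(policy(expert_obs), expert_act)
    opt.zero_grad(); loss.backward(); opt.step()

# Covariate-shift caveat: at rollout time the policy visits states absent from
# expert_obs, which is the failure mode that corrections like DAgger address.
```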
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- Health & Medicine (1.00)
- Education (1.00)
- Information Technology > Security & Privacy (0.68)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm
Chen, Yang, Zou, Menglin, Zhang, Jiaqi, Zhang, Yitan, Yang, Junyi, Gendron, Gael, Zhang, Libo, Liu, Jiamou, Witbrock, Michael J.
Inverse Reinforcement Learning (IRL) learns a reward function to explain expert demonstrations. Modern IRL methods often adopt an adversarial (minimax) formulation that alternates between reward and policy optimization, which often leads to unstable training. Recent non-adversarial IRL approaches improve stability by jointly learning the reward and policy via energy-based formulations, but they lack formal guarantees. This work bridges that gap. We first present a unified view showing that canonical non-adversarial methods explicitly or implicitly maximize the likelihood of expert behavior, which is equivalent to minimizing the expected return gap. This insight leads to our main contribution: Trust Region Reward Optimization (TRRO), a framework that guarantees monotonic improvement in this likelihood via a Minorization-Maximization process. We instantiate TRRO as Proximal Inverse Reward Optimization (PIRO), a practical and stable IRL algorithm. Theoretically, TRRO provides the IRL counterpart to the stability guarantees of Trust Region Policy Optimization (TRPO) in forward RL. Empirically, PIRO matches or surpasses state-of-the-art baselines in reward recovery and policy imitation, with high sample efficiency, on MuJoCo and Gym-Robotics benchmarks and a real-world animal behavior modeling task.
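The toy sketch below illustrates the general flavor of a proximal reward update: maximize a MaxEnt-style likelihood of expert data while penalizing movement away from the previous reward iterate. It is emphatically not the paper's PIRO algorithm; the squared-difference proximal term, penalty coefficient, and data shapes are all assumptions.

```python
# Hedged sketch of a proximal (trust-region-flavored) reward update for IRL.
import torch
import torch.nn as nn

reward = nn.Sequential(nn.Linear(6, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward.parameters(), lr=1e-3)

expert_sa = torch.randn(128, 6)            # expert (state, action) features (synthetic)
policy_sa = torch.randn(128, 6)            # samples from the current learned policy
old_r_policy = reward(policy_sa).detach()  # snapshot for the proximal term

for _ in range(50):
    r_exp, r_pol = reward(expert_sa), reward(policy_sa)
    # Discriminative MaxEnt-style objective: push expert rewards up, sampled rewards
    # down, with logsumexp as a crude sample-based partition estimate.
    nll = -(r_exp.mean() - torch.logsumexp(r_pol.squeeze(-1), dim=0))
    proximal = ((r_pol - old_r_policy) ** 2).mean()  # stay near the previous reward
    loss = nll + 10.0 * proximal
    opt.zero_grad(); loss.backward(); opt.step()
```

The proximal term plays the role the trust region plays in TRPO: it bounds how far each reward update can drift, which is the stability property the abstract emphasizes.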
- Asia > China > Shanghai > Shanghai (0.04)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
Diffusion Policies with Offline and Inverse Reinforcement Learning for Promoting Physical Activity in Older Adults Using Wearable Sensors
Liu, Chang, Thiamwong, Ladda, Fu, Yanjie, Xie, Rui
Applying offline reinforcement learning (RL) to real-world clinical data is attracting increasing attention in AI for healthcare, but implementation poses significant challenges. Defining direct rewards is difficult, and inverse RL (IRL) struggles to infer accurate reward functions from expert behavior in complex environments. Offline RL also encounters challenges in aligning learned policies with observed human behavior in healthcare applications. To address these challenges in promoting physical activity among older adults at high risk of falls, based on wearable-sensor activity monitoring, we introduce Kolmogorov-Arnold Networks and Diffusion Policies for Offline Inverse Reinforcement Learning (KANDI). Leveraging the flexible function approximation of Kolmogorov-Arnold Networks, we estimate reward functions by learning free-living behavior from low-fall-risk older adults (experts), while diffusion-based policies within an actor-critic framework provide a generative approach to action refinement and efficiency in offline RL. We evaluate KANDI on wearable activity-monitoring data from a two-arm clinical trial in our Physio-feedback Exercise Program (PEER) study, emphasizing its practical application in a fall-risk intervention program to promote physical activity among older adults. KANDI also outperforms state-of-the-art methods on the D4RL benchmark. These results underscore KANDI's potential to address key challenges in offline RL for healthcare, offering an effective solution for activity-promotion intervention strategies.
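For orientation, the sketch below shows the standard epsilon-prediction training loop of a conditional diffusion policy on (state, action) data, which is the generative component the abstract refers to. KANDI's KAN-based reward and actor-critic coupling are not reproduced here, and the noise schedule, timestep count, and network sizes are assumptions.

```python
# Toy conditional diffusion-policy training: learn to denoise actions given states.
import torch
import torch.nn as nn

T = 10
betas = torch.linspace(1e-4, 0.2, T)
alphas_bar = torch.cumprod(1 - betas, dim=0)

# Denoiser conditions on state (4), noisy action (2), and normalized timestep (1).
denoiser = nn.Sequential(nn.Linear(4 + 2 + 1, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

states = torch.randn(64, 4)    # synthetic stand-in dataset
actions = torch.randn(64, 2)   # dataset actions to imitate/refine

for _ in range(200):           # standard DDPM epsilon-prediction loss
    t = torch.randint(0, T, (64,))
    eps = torch.randn_like(actions)
    ab = alphas_bar[t].unsqueeze(-1)
    noisy = ab.sqrt() * actions + (1 - ab).sqrt() * eps
    pred = denoiser(torch.cat([states, noisy, t.float().unsqueeze(-1) / T], dim=-1))
    loss = ((pred - eps) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Sampling an action then amounts to starting from Gaussian noise and iteratively denoising it conditioned on the current state, which gives the "action refinement" behavior the abstract describes.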
- North America > United States > Florida > Orange County > Orlando (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Arizona > Maricopa County > Tempe (0.04)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
Mixture of Autoencoder Experts Guidance using Unlabeled and Incomplete Data for Exploration in Reinforcement Learning
Malomgré, Elias, Simoens, Pieter
Recent trends in Reinforcement Learning (RL) highlight the need for agents to learn from reward-free interactions and alternative supervision signals, such as unlabeled or incomplete demonstrations, rather than relying solely on explicit reward maximization. Developing generalist agents that can adapt efficiently in real-world environments also often requires leveraging these reward-free signals to guide learning and behavior. However, while intrinsic motivation techniques provide a means for agents to seek out novel or uncertain states in the absence of explicit rewards, they are often challenged by dense-reward environments or by the complexity of high-dimensional state and action spaces. Furthermore, most existing approaches rely directly on unprocessed intrinsic reward signals, which makes it difficult to shape or control the agent's exploration effectively. We propose a framework that can effectively utilize expert demonstrations even when they are incomplete and imperfect. By applying a mapping function that transforms the similarity between an agent's state and the expert data into a shaped intrinsic reward, our method allows for flexible and targeted exploration of expert-like behaviors. We employ a Mixture of Autoencoder Experts to capture a diverse range of behaviors and to accommodate missing information in demonstrations. Experiments show that our approach enables robust exploration and strong performance in both sparse- and dense-reward environments, even when demonstrations are sparse or incomplete. This provides a practical framework for RL in realistic settings where optimal data is unavailable and precise reward control is needed.
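A minimal sketch of the shaped-intrinsic-reward idea: score each state by its best reconstruction under a mixture of autoencoders fit to expert data, then squash that error into a bounded bonus. The mixture size, shaping function, and temperature below are illustrative assumptions, not the paper's exact design.

```python
# Mixture-of-autoencoders intrinsic reward sketch: low reconstruction error under
# some expert-mode autoencoder means the state resembles expert behavior.
import torch
import torch.nn as nn

def make_ae(dim=8, code=3):
    return nn.Sequential(nn.Linear(dim, code), nn.ReLU(), nn.Linear(code, dim))

experts = [make_ae() for _ in range(4)]  # each AE would be trained on one behavior mode

def intrinsic_reward(state, temp=1.0):
    # Lowest reconstruction error across the mixture = closest expert mode.
    errs = torch.stack([(ae(state) - state).pow(2).mean(-1) for ae in experts])
    return torch.exp(-errs.min(dim=0).values / temp)  # shaped into (0, 1]

state = torch.randn(16, 8)
bonus = intrinsic_reward(state)  # added to the environment reward during RL
```

The exponential mapping is one choice of shaping function; the abstract's point is that any such mapping gives the designer a knob over exploration that raw intrinsic signals lack.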
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Belgium > Flanders > East Flanders > Ghent (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Domain-Invariant Per-Frame Feature Extraction for Cross-Domain Imitation Learning with Visual Observations
Kim, Minung, Lee, Kawon, Kim, Jungmo, Choi, Sungho, Han, Seungyul
Imitation learning (IL) enables agents to mimic expert behavior without reward signals but faces challenges in cross-domain scenarios with high-dimensional, noisy, and incomplete visual observations. To address this, we propose Domain-Invariant Per-Frame Feature Extraction for Imitation Learning (DIFF-IL), a novel IL method that extracts domain-invariant features from individual frames and adapts them into sequences to isolate and replicate expert behaviors. We also introduce a frame-wise time labeling technique to segment expert behaviors by timesteps and assign rewards aligned with temporal contexts, enhancing task performance. Experiments across diverse visual environments demonstrate the effectiveness of DIFF-IL in addressing complex visual tasks.
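One standard recipe for domain-invariant per-frame features is sketched below: the encoder is trained adversarially against a domain classifier via gradient reversal. This is a generic stand-in, not necessarily DIFF-IL's mechanism, and it omits the paper's sequence adaptation and frame-wise time labeling; frame shapes and network sizes are assumptions.

```python
# Gradient-reversal sketch: the classifier learns to tell domains apart while the
# reversed gradient pushes the encoder to erase domain-specific information.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, g):
        return -g  # flip gradients flowing into the encoder

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 64), nn.ReLU())
domain_clf = nn.Linear(64, 2)  # source (expert) vs target (agent) domain
opt = torch.optim.Adam([*encoder.parameters(), *domain_clf.parameters()], lr=1e-3)

src = torch.randn(32, 3, 16, 16)  # expert-domain frames (synthetic stand-ins)
tgt = torch.randn(32, 3, 16, 16)  # agent-domain frames
frames = torch.cat([src, tgt])
labels = torch.cat([torch.zeros(32), torch.ones(32)]).long()

for _ in range(100):
    feats = encoder(frames)
    logits = domain_clf(GradReverse.apply(feats))
    loss = nn.functional.cross_entropy(logits, labels)  # clf learns, encoder unlearns
    opt.zero_grad(); loss.backward(); opt.step()
```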
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
- Information Technology > Data Science > Data Mining > Feature Extraction (0.61)